A Framework for Data Cleaning in Data Warehouses
Abstract
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A range of methods and tools has been developed for this purpose. However, at least two questions remain to be answered: How can the efficiency of data cleaning be improved? How can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that manages data cleaning in data warehouses by focusing on data quality dimensions and by decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach is efficient and scalable for data cleaning in data warehouses.
Similar resources
Fuzzy multi-criteria selection procedures in choosing data source
Technology assessment and selection have a substantial impact on organizations' procedures with regard to technology transfer. Technological decisions are usually made by a group of experts, and integrating these viewpoints into a single decision can be quite complex. Today, operational databases and data warehouses exist to manage and organize data with specific features and henceforth, th...
Data Cleaning: Problems and Current Approaches
We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool suppo...
Issues for On-Line Analytical Mining of Data Warehouses
Data warehouses and OLAP engines are expected to be widely available in the near future. The data in data warehouses has been cleansed, integrated, and preprocessed, and infrastructures have been built surrounding data warehouses for efficient data analysis. Therefore, data warehouses or OLAP databases are expected to be a major platform for data mining in the future. We discuss the issues relate...
Improving Data Cleaning Quality Using a Data Lineage Facility
The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. However, for some applications, existing ETL (Extraction Transformation Loading) and data cleaning tools for writing data cleaning programs are insufficient. One important challenge with them is the design of a da...
On Data Cleaning In Building XML Data Warehouses
One of the most important aspects of building an XML data warehouse is the data cleaning and integration process. This paper presents a detailed methodology for cleaning and integrating data, especially useful for general situations when different-source documents are involved. Both situations whereby the XML documents have an associated XML Schema or they are just independent XML documents are con...